Abstract: Suppose that one observes pairs $(x_1,Y_1), (x_2,Y_2), \ldots, (x_n,Y_n)$, where $x_1 \le x_2 \le \cdots \le x_n$ are fixed numbers, and $Y_1, Y_2, \ldots, Y_n$ are independent random variables with unknown distributions. The only assumption is that $\mathrm{Median}(Y_i) = f(x_i)$ for some unknown convex function $f$. We present a confidence band for this regression function $f$ using suitable multiscale sign-tests. While the exact computation of this band requires $O(n^4)$ steps, good approximations can be obtained in $O(n^2)$ steps. In addition the confidence band is shown to have desirable asymptotic properties as the sample size $n$ tends to infinity.

1. Introduction 

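Suppose that we are given data vectors $\mathbf{x}, \mathbf{Y} \in \mathbb{R}^n$, where $\mathbf{x}$ is a fixed vector with components $x_1 \le x_2 \le \cdots \le x_n$, and $\mathbf{Y}$ has independent components $Y_i$ with unknown distributions. We assume that

$$\mathrm{Median}(Y_i) = f(x_i) \tag{1}$$

for some unknown convex function $f : \mathbb{R} \to \bar{\mathbb{R}}$, where $\bar{\mathbb{R}}$ denotes the extended real line $[-\infty, \infty]$. To be precise, we assume that $f(x_i)$ is some median of $Y_i$. In what follows we present a confidence band $(L,U)$ for $f$. That means, $L = L(\cdot \mid \mathbf{x}, \mathbf{Y}, \alpha)$ and $U = U(\cdot \mid \mathbf{x}, \mathbf{Y}, \alpha)$ are data-dependent functions from $\mathbb{R}$ into $\bar{\mathbb{R}}$ such that

$$P\bigl(L(x) \le f(x) \le U(x) \text{ for all } x \in \mathbb{R}\bigr) \ge 1 - \alpha \tag{2}$$

for a given level $\alpha \in (0,1)$.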

Our confidence sets are based on a multiscale sign-test. A similar method has been applied by Dümbgen and Johns [2] to treat the case of isotonic regression functions, and the reader is referred to that paper for further references. The remainder of the present paper is organized as follows: Section 2 contains the explicit definition of our sign-test statistic and provides some critical values. A corresponding confidence band $(L,U)$ is described in Section 3. This includes exact algorithms for the computation of the upper bound $U$ and the lower bound $L$ whose running times are of order $O(n^4)$ and $O(n^3)$, respectively. For large data sets these computational complexities are certainly too high. Therefore we present approximate solutions in Section 4 whose running time is of order $O(n^2)$. In Section 5 we discuss the asymptotic behavior of the width of our confidence band as the sample size $n$ tends to infinity. Finally, in Section 6 we illustrate our methods with simulated and real data.

Explicit computer code (in MatLab) for the procedures of the present paper as well as of Dümbgen and Johns [2] may be downloaded from the author's homepage.

2. Definition of the test statistic 

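Given any candidate $g : \mathbb{R} \to \bar{\mathbb{R}}$ for $f$ we consider the sign vectors $\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))$ and $\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})$, where $g(\mathbf{x}) := (g(x_i))_{i=1}^n$ and

$$\mathrm{sign}(x) := 1\{x > 0\} - 1\{x \le 0\} \quad \text{for } x \in \bar{\mathbb{R}},$$

$$\mathrm{sign}(\mathbf{v}) := (\mathrm{sign}(v_i))_{i=1}^n \quad \text{for } \mathbf{v} = (v_i)_{i=1}^n \in \bar{\mathbb{R}}^n.$$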

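This non-symmetric definition of the sign function is necessary in order to deal with possibly non-continuous distributions: if $f(x_i)$ is a median of $Y_i$, then $\mathrm{sign}(Y_i - f(x_i)) = 1$ and $\mathrm{sign}(f(x_i) - Y_i) = 1$ occur only on the events $\{Y_i > f(x_i)\}$ and $\{Y_i < f(x_i)\}$, respectively, and each of these events has probability at most $1/2$, even if $Y_i$ has an atom at $f(x_i)$. Whenever the vector $\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))$ or $\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})$ contains "too many" ones in some region, the function $g$ is rejected. Our confidence set for $f$ comprises all convex functions $g$ which are not rejected.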
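The non-symmetric sign map is easy to get wrong in code, because NumPy's `np.sign` returns 0 at 0. A minimal sketch, assuming NumPy (the helper name `sign_vec` is ours, not from the paper), shows that $\mathrm{sign}(\mathbf{v})$ and $\mathrm{sign}(-\mathbf{v})$ are not negatives of each other when $\mathbf{v}$ has zero entries:

```python
import numpy as np

def sign_vec(v):
    """Non-symmetric sign: +1 where v_i > 0 and -1 where v_i <= 0 (including 0).

    Infinite entries are handled like on the extended real line.
    """
    v = np.asarray(v, dtype=float)
    return np.where(v > 0, 1, -1)

v = np.array([2.0, 0.0, -1.5, np.inf, -np.inf])
print(sign_vec(v).tolist())    # [1, -1, -1, 1, -1]
print(sign_vec(-v).tolist())   # [-1, -1, 1, -1, 1]
```

A zero residual therefore counts as $-1$ in both sign vectors, which is what keeps both $P(\mathrm{sign}(Y_i - f(x_i)) = 1)$ and $P(\mathrm{sign}(f(x_i) - Y_i) = 1)$ at most $1/2$.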

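Precisely, let $T_o : \{-1,1\}^n \to \mathbb{R}$ be some test statistic such that $T_o(\sigma) \le T_o(\tilde\sigma)$ whenever $\sigma \le \tilde\sigma$ component-wise. Then we define

$$T(\mathbf{v}) := \max\bigl\{T_o(\mathrm{sign}(\mathbf{v})),\ T_o(\mathrm{sign}(-\mathbf{v}))\bigr\}$$

for $\mathbf{v} \in \bar{\mathbb{R}}^n$. Let $\xi \in \{-1,1\}^n$ be a Rademacher vector, i.e. a random vector with independent components $\xi_i$ which are uniformly distributed on $\{-1,1\}$. Further let $\kappa = \kappa(n,\alpha)$ be the smallest $(1-\alpha)$-quantile of $T(\xi)$. Then

$$P\bigl(T(\mathbf{Y} - f(\mathbf{x})) \le \kappa\bigr) \ge P\bigl(T(\xi) \le \kappa\bigr) \ge 1 - \alpha;$$

see Dümbgen and Johns [2]. Consequently the set

$$\mathcal{C}(\mathbf{x},\mathbf{Y},\alpha) := \bigl\{\text{convex } g : T(\mathbf{Y} - g(\mathbf{x})) \le \kappa\bigr\}$$

contains $f$ with probability at least $1 - \alpha$.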

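As for the test statistic $T_o$, let $\psi$ be the triangular kernel function given by

$$\psi(x) := \max(1 - |x|, 0).$$

Then we define

$$T_o(\sigma) := \max_{d = 1,\ldots,\lfloor (n+1)/2 \rfloor} \Bigl( \max_{j = 1,\ldots,n} T_{d,j}(\sigma) - \Gamma\Bigl(\frac{2d-1}{n}\Bigr) \Bigr),$$

where

$$\Gamma(u) := \bigl(2\log(e/u)\bigr)^{1/2},$$

$$T_{d,j}(\sigma) := \beta_d \sum_{i=1}^n \psi\Bigl(\frac{i-j}{d}\Bigr)\sigma_i \quad \text{with} \quad \beta_d := \Bigl(\sum_{i=1-d}^{d-1} \psi\Bigl(\frac{i}{d}\Bigr)^2\Bigr)^{-1/2}.$$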

Note that $T_{d,j}(\sigma)$ is measuring whether $(\sigma_i)_{j-d<i<j+d}$ contains suspiciously many ones. Thus $d$ and $j$ can be viewed as scale and location parameter, respectively. The normalizing constant $\beta_d$ is chosen such that the standard deviation of $T_{d,j}(\xi)$ is not greater than one, with equality if $d \le j \le n+1-d$. The additive correction term $\Gamma((2d-1)/n)$ is justified by results of Dümbgen and Spokoiny [3] about multiscale testing. In fact, Theorem 6.1 of Dümbgen and Spokoiny [3] and Donsker's invariance principle for partial sums of the Rademacher vector $\xi$ together imply that the distribution of $T(\xi)$ converges weakly to a probability distribution on $[0,\infty)$ as $n \to \infty$.

Explicit formulae for quantiles of the limiting distribution of $T(\xi)$ are not available. Therefore we list some quantiles of $T(\xi)$ for various values of $n$ and $\alpha$ in Table 1. Each quantile has been estimated in 19999 Monte Carlo simulations.

Table 1
Critical values $\kappa(n,\alpha)$

                             Sample size n
  α        100    200    300    500    700   1000   2000   5000  10000
  0.50   0.054  0.124  0.152  0.188  0.216  0.232  0.279  0.333  0.362
  0.10   0.792  0.860  0.867  0.904  0.902  0.915  0.970  0.991  1.021
  0.05   1.035  1.102  1.102  1.135  1.136  1.152  1.229  1.231  1.246
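The statistic $T_o$ and the Monte Carlo estimation of $\kappa(n,\alpha)$ can be sketched in a few lines, assuming NumPy. The sketch uses the closed form $\beta_d = (3d/(2d^2+1))^{1/2}$, which follows from $\sum_{i=1-d}^{d-1}\psi(i/d)^2 = (2d^2+1)/(3d)$, and evaluates all $T_{d,j}$ for one scale $d$ by a single convolution. This direct evaluation costs $O(n^3)$ rather than the $O(n^2)$ of Section 3.3, and the function names `T_o`, `T` and `kappa_mc` are ours:

```python
import math
import numpy as np

def T_o(sigma):
    """Multiscale statistic T_o(sigma) of Section 2 for sigma in {-1, 1}^n.

    One convolution per scale d gives all kernel sums at once; this direct
    evaluation costs O(n^3) in total, more than the O(n^2) of Section 3.3.
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    best = -math.inf
    for d in range(1, (n + 1) // 2 + 1):
        w = 1.0 - np.abs(np.arange(1 - d, d)) / d            # psi(z/d) for |z| < d
        beta_d = math.sqrt(3.0 * d / (2.0 * d * d + 1.0))     # closed form of beta_d
        T_dj = beta_d * np.convolve(sigma, w, mode="same")    # T_{d,j}(sigma), j = 1..n
        gamma = math.sqrt(2.0 * math.log(math.e * n / (2 * d - 1)))  # Gamma((2d-1)/n)
        best = max(best, float(T_dj.max()) - gamma)
    return best

def T(v):
    """T(v) = max{T_o(sign(v)), T_o(sign(-v))}, with sign(0) = -1."""
    v = np.asarray(v, dtype=float)
    return max(T_o(np.where(v > 0, 1, -1)), T_o(np.where(-v > 0, 1, -1)))

def kappa_mc(n, alpha, n_sim=1000, seed=0):
    """Monte Carlo estimate of the smallest (1 - alpha)-quantile of T(xi)."""
    rng = np.random.default_rng(seed)
    sims = np.sort([T(rng.choice([-1.0, 1.0], size=n)) for _ in range(n_sim)])
    k = math.ceil((1.0 - alpha) * n_sim - 1e-9)   # smallest k with k / n_sim >= 1 - alpha
    return float(sims[k - 1])
```

With far fewer replications than the 19999 used for Table 1, `kappa_mc` reproduces the tabulated values only roughly; `kappa_mc(100, 0.05, n_sim=19999)` would mimic the original set-up at a correspondingly higher running time.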

3. Definition and exact computation of a band 

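In principle one could define a confidence band $(\hat L, \hat U)$ via

$$\hat L := \min\{g : g \in \mathcal{C}(\mathbf{x},\mathbf{Y},\alpha)\} = \min\bigl\{\text{convex } g : T_o(\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))) \le \kappa,\ T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa\bigr\},$$

$$\hat U := \max\{g : g \in \mathcal{C}(\mathbf{x},\mathbf{Y},\alpha)\} = \max\bigl\{\text{convex } g : T_o(\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))) \le \kappa,\ T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa\bigr\}.$$

Throughout this paper maxima or minima of functions are defined pointwise. Unfortunately, the explicit computation of $(\hat L, \hat U)$ is far from trivial. Therefore we modify the latter definition and compute a band $(L,U)$ in two steps. Our upper boundary is given by

$$U := \max\bigl\{\text{convex } g : T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa\bigr\}.$$

Thus we just drop the constraint $T_o(\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))) \le \kappa$ in the definition of $\hat U$ and obtain $U \ge \hat U$. With $U$ at hand, our lower boundary is defined as

$$L := \min\bigl\{\text{convex } g : g \le U,\ T_o(\mathrm{sign}(\mathbf{Y} - g(\mathbf{x}))) \le \kappa\bigr\}.$$

Here we replace the constraint $T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa$ in the definition of $\hat L$ with the weaker constraint $g \le U$ and obtain $L \le \hat L$. In what follows we concentrate on the computation of the corresponding vectors $\mathbf{L} = (L_i)_{i=1}^n = L(\mathbf{x})$ and $\mathbf{U} = (U_i)_{i=1}^n = U(\mathbf{x})$.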

3.1. Computation of U

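A simplified expression for U. To determine $U$ it suffices to consider the class $\mathcal{G}$ consisting of the following convex functions $g_{j,k}$: For $1 \le j < k \le n$ with $x_j < x_k$ define

$$g_{j,k}(x) := Y_j + \frac{Y_k - Y_j}{x_k - x_j}(x - x_j),$$

describing a straight line connecting the data points $(x_j, Y_j)$ and $(x_k, Y_k)$. Moreover, for $j, k \in \{1,\ldots,n\}$ let

$$g_{0,k}(x) := \begin{cases} \infty & \text{if } x < x_k,\\ Y_k & \text{if } x = x_k,\\ -\infty & \text{if } x > x_k, \end{cases} \qquad g_{j,n+1}(x) := \begin{cases} -\infty & \text{if } x < x_j,\\ Y_j & \text{if } x = x_j,\\ \infty & \text{if } x > x_j. \end{cases}$$

Then

$$U = \max\bigl\{g : g \in \mathcal{G},\ T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa\bigr\}. \tag{3}$$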

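To see this, let $g$ be any convex function such that $T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa$. Let $\tilde g$ be the largest convex function such that $\tilde g(x_i) \le Y_i$ for all indices $i$ with $g(x_i) \le Y_i$. This function $\tilde g$ is closely related to the convex hull of all data points $(x_i, Y_i)$ with $g(x_i) \le Y_i$. Obviously, $\tilde g \ge g$ and $T_o(\mathrm{sign}(\tilde g(\mathbf{x}) - \mathbf{Y})) = T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y}))$. Let $\omega(1) < \cdots < \omega(m)$ be indices such that $x_{\omega(1)} < \cdots < x_{\omega(m)}$ and

$$\{(x, \tilde g(x)) : x \in \mathbb{R}\} \cap \{(x_i, Y_i) : 1 \le i \le n\} = \{(x_{\omega(t)}, Y_{\omega(t)}) : 1 \le t \le m\}.$$

With $\omega(0) := 0$ and $\omega(m+1) := n+1$ one may write $\tilde g$ as the maximum of the functions $g_{\omega(t-1),\omega(t)}$, $1 \le t \le m+1$, all of which satisfy the inequality

$$T_o\bigl(\mathrm{sign}(g_{\omega(t-1),\omega(t)}(\mathbf{x}) - \mathbf{Y})\bigr) \le T_o\bigl(\mathrm{sign}(\tilde g(\mathbf{x}) - \mathbf{Y})\bigr) \le \kappa.$$

Figure 1 illustrates these considerations.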

Computational complexity. As we shall explain in Section 3.3, the computation of $T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y}))$ for one single candidate function $g \in \mathcal{G}$ requires $O(n^2)$ steps. In case of $T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa$ we have to replace $\mathbf{U}$ with the vector $(\max(g(x_i), U_i))_{i=1}^n$ in another $O(n)$ steps. Consequently, since $\mathcal{G}$ contains at most $n(n-1)/2 + 2n = O(n^2)$ functions, the computation of $\mathbf{U}$ requires $O(n^4)$ steps.
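Equation (3) translates directly into a brute-force procedure. The following Python sketch, assuming NumPy and sorted design points, evaluates $U$ at $x_1,\ldots,x_n$ by enumerating all of $\mathcal{G}$. For simplicity it evaluates $T_o$ directly in $O(n^3)$ instead of using the $O(n^2)$ routine of Section 3.3, so it is only meant for small $n$; the names `upper_band_exact` and `sign_vec` are ours:

```python
import math
import numpy as np

def T_o(sigma):
    """Direct O(n^3) evaluation of the multiscale statistic T_o of Section 2."""
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    best = -math.inf
    for d in range(1, (n + 1) // 2 + 1):
        w = 1.0 - np.abs(np.arange(1 - d, d)) / d            # psi(z/d) for |z| < d
        beta_d = math.sqrt(3.0 * d / (2.0 * d * d + 1.0))
        T_dj = beta_d * np.convolve(sigma, w, mode="same")
        gamma = math.sqrt(2.0 * math.log(math.e * n / (2 * d - 1)))
        best = max(best, float(T_dj.max()) - gamma)
    return best

def sign_vec(v):
    """Non-symmetric sign: +1 where v_i > 0, -1 where v_i <= 0 (also for +/-inf)."""
    return np.where(np.asarray(v, dtype=float) > 0, 1, -1)

def upper_band_exact(x, Y, kappa):
    """U(x_1), ..., U(x_n) via (3); x is assumed sorted as in Section 1."""
    x, Y = np.asarray(x, dtype=float), np.asarray(Y, dtype=float)
    n = x.size
    candidates = []
    for j in range(n):
        for k in range(j + 1, n):
            if x[j] < x[k]:            # line g_{j,k} through (x_j, Y_j) and (x_k, Y_k)
                slope = (Y[k] - Y[j]) / (x[k] - x[j])
                candidates.append(Y[j] + slope * (x - x[j]))
    for k in range(n):                 # g_{0,k}: +inf left of x_k, Y_k at x_k, -inf right
        candidates.append(np.where(x < x[k], np.inf, np.where(x > x[k], -np.inf, Y[k])))
    for j in range(n):                 # g_{j,n+1}: -inf left of x_j, Y_j at x_j, +inf right
        candidates.append(np.where(x < x[j], -np.inf, np.where(x > x[j], np.inf, Y[j])))
    U = np.full(n, -np.inf)            # stays -inf where no candidate is accepted
    for g in candidates:
        if T_o(sign_vec(g - Y)) <= kappa:
            U = np.maximum(U, g)       # pointwise maximum over accepted candidates
    return U
```

Because the set of accepted candidates can only grow with $\kappa$, the output is non-decreasing in $\kappa$, which is a convenient sanity check for any faster implementation.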




Fig 1. A function $g$ and its associated function $\tilde g$.
3.2. Computation of L

From now on we assume that $U$ is nontrivial, i.e. that $U_i = U(x_i) < \infty$ for some value $x_i$. Moreover, letting $x_{\min}$ and $x_{\max}$ be the smallest and largest such value, we assume that $x_{\min} < x_{\max}$. Finally we assume that $T_o(\mathrm{sign}(\mathbf{Y} - U(\mathbf{x}))) \le \kappa$. Otherwise the confidence set $\mathcal{C}(\mathbf{x},\mathbf{Y},\alpha)$ would be empty, meaning that convexity of the median function is not plausible.

Simplified formulae for L.

Similarly as in the previous section, one may replace the set of all convex functions with a finite subset $\mathcal{H} = \mathcal{H}(U)$. First of all let $h$ be any convex function such that $h \le U$ and $T_o(\mathrm{sign}(\mathbf{Y} - h(\mathbf{x}))) \le \kappa$. For any real number $t$ let $z := h(t)$. Now let $\tilde h = h_{t,z}$ be the largest convex function such that $\tilde h \le U$ and $\tilde h(t) \le z$. Obviously $\tilde h \ge h$, whence $T_o(\mathrm{sign}(\mathbf{Y} - \tilde h(\mathbf{x}))) \le \kappa$. Consequently,

$$L(t) = \inf\bigl\{z \in \mathbb{R} : T_o(\mathrm{sign}(\mathbf{Y} - h_{t,z}(\mathbf{x}))) \le \kappa\bigr\}. \tag{4}$$

Figure 2 illustrates the definition of $h_{t,z}$. Note that $h_{t,z}$ is given by the convex hull of the point $(t,z)$ and the epigraph of $U$, i.e. the set of all pairs $(x,y) \in \mathbb{R}^2$ such that $U(x) \le y$.

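Fig 2. The extremal function $h_{t,z}$ for two points $(t,z)$.

Starting from equation (4) we derive a computable expression for $L$. For that purpose we define tangent parameters as follows: Let $J$ be the set of all indices $j \in \{1,\ldots,n\}$ such that $U(x_j) > Y_j$. For $j \in J$ define

$$s_j^l := \begin{cases} -\infty & \text{if } x_j \le x_{\min},\\ \displaystyle\max_{x_k < x_j} \frac{Y_j - U(x_k)}{x_j - x_k} & \text{else}, \end{cases} \qquad a_j^l := \begin{cases} x_j & \text{if } x_j \le x_{\min},\\ \displaystyle\operatorname*{argmax}_{x_k < x_j} \frac{Y_j - U(x_k)}{x_j - x_k} & \text{else}, \end{cases}$$

$$s_j^r := \begin{cases} \infty & \text{if } x_j \ge x_{\max},\\ \displaystyle\min_{x_k > x_j} \frac{U(x_k) - Y_j}{x_k - x_j} & \text{else}, \end{cases} \qquad a_j^r := \begin{cases} x_j & \text{if } x_j \ge x_{\max},\\ \displaystyle\operatorname*{argmin}_{x_k > x_j} \frac{U(x_k) - Y_j}{x_k - x_j} & \text{else}. \end{cases}$$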

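With these parameters we define auxiliary tangent functions

$$h_j^l(x) := \begin{cases} U(x) & \text{if } x < a_j^l,\\ Y_j + s_j^l(x - x_j) & \text{if } x \ge a_j^l, \end{cases} \qquad h_j^r(x) := \begin{cases} Y_j + s_j^r(x - x_j) & \text{if } x \le a_j^r,\\ U(x) & \text{if } x > a_j^r. \end{cases}$$

Figure 3 depicts these functions $h_j^l$ and $h_j^r$. Note that

$$h_j^l(x) = \begin{cases} \max\{h(x) : h \text{ convex},\ h \le U,\ h(x_j) \le Y_j\} & \text{if } x \le x_j,\\ \min\{h(x) : h \text{ convex},\ h \le U,\ h(x_j) \ge Y_j\} & \text{if } x \ge x_j, \end{cases}$$

$$h_j^r(x) = \begin{cases} \min\{h(x) : h \text{ convex},\ h \le U,\ h(x_j) \ge Y_j\} & \text{if } x \le x_j,\\ \max\{h(x) : h \text{ convex},\ h \le U,\ h(x_j) \le Y_j\} & \text{if } x \ge x_j. \end{cases}$$

In particular, $h_j^l(x_j) = h_j^r(x_j) = Y_j$. In addition we define $h_0^l := h_{n+1}^r :\equiv -\infty$. Then we set

$$h_{j,k} := \max(h_j^l, h_k^r) \quad \text{and} \quad \mathcal{H} := \bigl\{h_{j,k} : j \in \{0\} \cup J,\ k \in J \cup \{n+1\}\bigr\}.$$

This class $\mathcal{H}$ consists of at most $(n+1)^2$ functions, and elementary considerations show that

$$L = \min\bigl\{h \in \mathcal{H} : T_o(\mathrm{sign}(\mathbf{Y} - h(\mathbf{x}))) \le \kappa\bigr\}. \tag{5}$$

Fig 3. The tangent functions $h_j^l$ and $h_k^r$.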
Computational complexity. Note first that any pair of tangent parameters $(a_j^l, s_j^l)$ or $(a_j^r, s_j^r)$ may be computed in $O(n)$ steps. Consequently, before starting with $L$ we may compute all tangent parameters in time $O(n^2)$. Then Equation (5) implies that $L$ may be computed in $O(n^4)$ steps. However, this can be improved considerably. The reason is, roughly speaking, that for fixed $j$, one can determine the smallest function $h_{j,k}$ such that $T_o(\mathrm{sign}(\mathbf{Y} - h_{j,k}(\mathbf{x}))) \le \kappa$ in $O(n^2)$ steps, as explained in the subsequent section. Hence a proper implementation lets us compute $L$ in $O(n^3)$ steps.

3.3. An auxiliary routine 

In this section we show that the value of $T_o(\sigma)$ can be computed in $O(n^2)$ steps. More generally, we consider $n$-dimensional sign vectors $\sigma^{(0)}, \sigma^{(1)}, \ldots, \sigma^{(q)}$ such that for $1 \le \ell \le q$ the vectors $\sigma^{(\ell-1)}$ and $\sigma^{(\ell)}$ differ exactly in one component, say,

$$\sigma^{(\ell-1)}_{\omega(\ell)} = 1 \quad \text{and} \quad \sigma^{(\ell)}_{\omega(\ell)} = -1$$

for some index $\omega(\ell) \in \{1,\ldots,n\}$. Thus $\sigma^{(0)} \ge \sigma^{(1)} \ge \cdots \ge \sigma^{(q)}$ component-wise. In particular, $T_o(\sigma^{(\ell)})$ is non-increasing in $\ell$. It is possible to determine the number

$$\ell^* := \min\bigl(\{\ell \in \{0,\ldots,q\} : T_o(\sigma^{(\ell)}) \le \kappa\} \cup \{\infty\}\bigr)$$

in $O(n^2)$ steps as follows:

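Algorithm. We use three vector variables $S$, $s^{(d)}$ and $S^{(d)}$ plus two scalar variables $\ell$ and $d$. While running the algorithm the variable $S$ contains the current vector $\sigma^{(\ell)}$, while

$$s^{(d)}_j = \sum_{i \in [j-d+1,\, j+d-1]} (d - |i-j|)\, S_i \quad \text{and} \quad S^{(d)}_j = \sum_{i \in [j-d+1,\, j+d-1]} S_i \qquad (1 \le j \le n),$$

where only indices $i \in \{1,\ldots,n\}$ are taken into account.

Initialisation.

$$\ell \leftarrow 0, \quad d \leftarrow 1, \quad S \leftarrow \sigma^{(0)}, \quad s^{(1)} \leftarrow S, \quad S^{(1)} \leftarrow S.$$

Induction step. Check whether

$$\max_{i=1,\ldots,n} s^{(d)}_i \le \Bigl(\sum_{z=1-d}^{d-1} (d - |z|)^2\Bigr)^{1/2} \Bigl(\Gamma\Bigl(\frac{2d-1}{n}\Bigr) + \kappa\Bigr) = \bigl((2d^2+1)d/3\bigr)^{1/2} \Bigl(\Gamma\Bigl(\frac{2d-1}{n}\Bigr) + \kappa\Bigr). \tag{6}$$

• If (6) is fulfilled and $d < \lfloor (n+1)/2 \rfloor$, then $d \leftarrow d + 1$,

$$S^{(d)}_i \leftarrow \begin{cases} S^{(d)}_i + S_{i+d-1} & \text{for } i < d,\\ S^{(d)}_i + S_{i-d+1} + S_{i+d-1} & \text{for } d \le i \le n+1-d,\\ S^{(d)}_i + S_{i+1-d} & \text{for } i > n+1-d, \end{cases}$$

and $s^{(d)} \leftarrow s^{(d)} + S^{(d)}$.
• If (6) is fulfilled and $d = \lfloor (n+1)/2 \rfloor$, then $T_o(\sigma^{(\ell)}) \le \kappa$ and $\ell^* = \ell$.
• If (6) is violated and $\ell < q$, then $\ell \leftarrow \ell + 1$, $S_{\omega(\ell)} \leftarrow -1$, and for all $i$ with $|i - \omega(\ell)| < d$,

$$S^{(d)}_i \leftarrow S^{(d)}_i - 2, \qquad s^{(d)}_i \leftarrow s^{(d)}_i - 2\bigl(d - |i - \omega(\ell)|\bigr).$$

• If (6) is violated but $\ell = q$, then $T_o(\sigma^{(q)}) > \kappa$ and $\ell^* = \infty$.

As for the running time of this algorithm, note that each induction step requires $O(n)$ operations. The scale $d$ never has to be decreased: if (6) holds for $\sigma^{(\ell)}$ at some scale, it also holds for $\sigma^{(\ell')}$ with $\ell' > \ell$ at that scale, because $\psi \ge 0$ and $\sigma^{(\ell')} \le \sigma^{(\ell)}$. Since either $d$ or $\ell$ increases each time by one, the algorithm terminates after at most $n + q + 1 \le 2n + 1$ induction steps. Together with $O(n)$ operations for the initialisation we end up with total running time $O(n^2)$.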
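The routine can be transcribed into Python roughly as follows, assuming NumPy and 0-based indices. The names `first_accepted` and `first_accepted_naive` and the window-sum array `W` are ours; `s` holds the kernel-weighted sums $s^{(d)}$ that enter condition (6). Widening the windows and flipping one component each cost $O(n)$, as in the text. A naive version that recomputes $T_o$ from scratch for every $\ell$ is included for cross-checking:

```python
import math
import numpy as np

def T_o(sigma):
    """Direct O(n^3) evaluation of T_o(sigma), used only for cross-checking."""
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    best = -math.inf
    for d in range(1, (n + 1) // 2 + 1):
        w = 1.0 - np.abs(np.arange(1 - d, d)) / d
        beta_d = math.sqrt(3.0 * d / (2.0 * d * d + 1.0))
        T_dj = beta_d * np.convolve(sigma, w, mode="same")
        gamma = math.sqrt(2.0 * math.log(math.e * n / (2 * d - 1)))
        best = max(best, float(T_dj.max()) - gamma)
    return best

def first_accepted_naive(sigma0, omega, kappa):
    """Reference version: evaluate T_o(sigma^(l)) from scratch for l = 0, 1, ..."""
    sig = np.asarray(sigma0, dtype=float).copy()
    for l in range(len(omega) + 1):
        if l > 0:
            sig[omega[l - 1]] = -1.0
        if T_o(sig) <= kappa:
            return l
    return math.inf

def first_accepted(sigma0, omega, kappa):
    """Incremental routine: smallest l with T_o(sigma^(l)) <= kappa, else math.inf.

    sigma0: starting sign vector in {-1, 1}^n.
    omega:  0-based indices; step l flips sigma_{omega[l-1]} from +1 to -1.
    """
    n = len(sigma0)
    S = np.asarray(sigma0, dtype=float).copy()  # current sigma^(l)
    s = S.copy()   # s_j = sum_{|i-j|<d} (d - |i-j|) S_i, currently for d = 1
    W = S.copy()   # W_j = sum_{|i-j|<d} S_i (plain window sums), for d = 1
    idx = np.arange(n)
    d, l, q, d_max = 1, 0, len(omega), (n + 1) // 2
    while True:
        gamma = math.sqrt(2.0 * math.log(math.e * n / (2 * d - 1)))
        bound = math.sqrt((2 * d * d + 1) * d / 3.0) * (gamma + kappa)
        if s.max() <= bound:            # condition (6) holds at scale d
            if d == d_max:
                return l
            d += 1                      # widen every window by one index per side
            left, right = idx - d + 1, idx + d - 1
            W += np.where(left >= 0, S[np.clip(left, 0, n - 1)], 0.0)
            W += np.where(right <= n - 1, S[np.clip(right, 0, n - 1)], 0.0)
            s += W                      # s^(d) = s^(d-1) + W^(d)
        elif l < q:                     # (6) violated: flip the next component
            k = omega[l]
            l += 1
            S[k] = -1.0
            lo, hi = max(0, k - d + 1), min(n, k + d)
            W[lo:hi] -= 2.0
            s[lo:hi] -= 2.0 * (d - np.abs(idx[lo:hi] - k))
        else:
            return math.inf
```

The check in (6) compares integer-valued sums with $\bigl((2d^2+1)d/3\bigr)^{1/2}(\Gamma((2d-1)/n) + \kappa)$, which is equivalent to $T_{d,j} \le \Gamma((2d-1)/n) + \kappa$ because $T_{d,j} = \beta_d s^{(d)}_j / d$.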



4. Approximate solutions 

Approximation of U. Recall that the exact computation of $U$ involves testing whether a straight line given by a function $g(\cdot)$ and touching one or two data points $(x_i, Y_i)$ satisfies the inequality $T_o(\mathrm{sign}(g(\mathbf{x}) - \mathbf{Y})) \le \kappa$. The idea of our approximation is to restrict our attention to straight lines whose slope belongs to a given finite set.

Step 1. At first we consider the straight lines $g_{0,k}$ introduced in Section 3.1, all having slope $-\infty$. Let $\omega(1), \ldots, \omega(n)$ be a list of $\{1,\ldots,n\}$ such that $g_{0,\omega(1)} \ge g_{0,\omega(2)} \ge \cdots \ge g_{0,\omega(n)}$. In other words, for $1 < \ell \le n$ either $x_{\omega(\ell-1)} > x_{\omega(\ell)}$, or $x_{\omega(\ell-1)} = x_{\omega(\ell)}$ and $Y_{\omega(\ell-1)} \ge Y_{\omega(\ell)}$. With the auxiliary procedure of Section 3.3 we can determine the smallest number $\ell_o$ such that $T_o(\mathrm{sign}(g_{0,\omega(\ell_o)}(\mathbf{x}) - \mathbf{Y})) \le \kappa$ in $O(n^2)$ steps. We write $G_0 := g_{0,\omega(\ell_o)}$. Note that $x_{\omega(\ell_o)}$ is equal to $x_{\min} = \min\{x : U(x) < \infty\}$.

Step 2. For any given slope $s \in \mathbb{R}$ let $a(s)$ be the largest real number such that the sign vector

$$\sigma(s) := \bigl(\mathrm{sign}(a(s) + s x_i - Y_i)\bigr)_{i=1}^n$$

satisfies the inequality $T_o(\sigma(s)) \le \kappa$. This number can also be determined in time $O(n^2)$. This time we have to generate and use a list $\omega(1), \ldots, \omega(n)$ of $\{1, 2, \ldots, n\}$ such that $Y_{\omega(\ell)} - s x_{\omega(\ell)}$ is non-increasing in $\ell$.

Now we determine the numbers $a(s_1), \ldots, a(s_{M-1})$ for given slopes $s_1 < \cdots < s_{M-1}$. Then we define

$$G_\ell(x) := a(s_\ell) + s_\ell x \quad \text{for } 1 \le \ell < M.$$
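Step 2 can be sketched as follows, assuming NumPy. Instead of the $O(n^2)$ routine of Section 3.3, this sketch locates the first accepted sign vector by bisection, which is legitimate because $T_o(\sigma^{(\ell)})$ is non-increasing along the flip sequence; it needs $O(\log n)$ direct evaluations of $T_o$. The helpers `intercept_a` and `lower_bound_from_slopes` are hypothetical names; the latter only uses the lines $G_1, \ldots, G_{M-1}$ (Steps 1 and 3 are omitted), so at the design points it is a lower bound for $U$, typically weaker than $U_*$:

```python
import math
import numpy as np

def T_o(sigma):
    """Direct O(n^3) evaluation of the multiscale statistic T_o of Section 2."""
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    best = -math.inf
    for d in range(1, (n + 1) // 2 + 1):
        w = 1.0 - np.abs(np.arange(1 - d, d)) / d
        beta_d = math.sqrt(3.0 * d / (2.0 * d * d + 1.0))
        T_dj = beta_d * np.convolve(sigma, w, mode="same")
        gamma = math.sqrt(2.0 * math.log(math.e * n / (2 * d - 1)))
        best = max(best, float(T_dj.max()) - gamma)
    return best

def intercept_a(x, Y, s, kappa):
    """Largest a with T_o(sign(a + s*x - Y)) <= kappa, as in Step 2.

    With c_i = Y_i - s*x_i, component i of sign(a + s*x - Y) is -1 exactly
    when a <= c_i. Lowering a therefore flips components from +1 to -1 in the
    order of decreasing c_i, and T_o can only decrease along the way.
    """
    c = np.asarray(Y, dtype=float) - s * np.asarray(x, dtype=float)
    order = np.argsort(-c, kind="stable")    # omega: c[order] is non-increasing
    n = c.size

    def accepted(l):                          # is T_o(sigma^(l)) <= kappa ?
        sigma = np.ones(n)
        sigma[order[:l]] = -1.0
        return T_o(sigma) <= kappa

    if accepted(0):
        return math.inf                       # even the all-ones vector passes
    if not accepted(n):
        return -math.inf                      # no line with slope s passes
    lo, hi = 0, n                             # invariant: lo rejected, hi accepted
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepted(mid):
            hi = mid
        else:
            lo = mid
    return float(c[order[hi - 1]])            # a(s) = c_{omega(l*)} with l* = hi

def lower_bound_from_slopes(x, Y, slopes, kappa):
    """Pointwise max of the lines a(s) + s*x over the given slopes, at the x_i."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, -np.inf)
    for s in slopes:
        a = intercept_a(x, Y, s, kappa)
        if a > -math.inf:                     # skip slopes where no line passes
            out = np.maximum(out, a + s * x)
    return out
```

Because of the non-symmetric sign function, the set of admissible intercepts is closed on the right, so $a(s)$ is attained and equals one of the values $Y_i - s x_i$.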

Step 3. Finally we determine the largest function $G_M$ among the degenerate linear functions $g_{1,n+1}, \ldots, g_{n,n+1}$ such that $T_o(\mathrm{sign}(G_M(\mathbf{x}) - \mathbf{Y})) \le \kappa$. This is analogous to Step 1 and yields the number $x_{\max} = \max\{x : U(x) < \infty\}$.

Step 4. By means of this list of finitely many straight lines $G_0, G_1, \ldots, G_M$ one obtains the lower bound $U_* := \max(G_0, G_1, \ldots, G_M)$ for $U$. In fact, one could even replace $G_\ell$ with the largest convex function $\tilde G_\ell$ such that $\tilde G_\ell(x_i) \le Y_i$ whenever $G_\ell(x_i) \le Y_i$. Each of these functions can be computed via a suitable variant of the pool-adjacent-violators algorithm in $O(n)$ steps; see Robertson et al. [6].

Step 5. To obtain an upper bound $U^*$ for $U$, for $1 \le \ell \le M$ let $H_\ell$ be the smallest concave function such that $H_\ell(x_i) \ge Y_i$ whenever $\max(G_{\ell-1}(x_i), G_\ell(x_i)) \ge Y_i$. Again $H_\ell$ may be determined via the pool-adjacent-violators algorithm. Then elementary considerations show that

$$U \le U^* := \max(U_*, H_1, H_2, \ldots, H_M).$$

All in all, these five steps require $O(Mn^2)$ steps. By visual inspection of the two curves $U_*$ and $U^*$ one may opt for a refined grid of slopes or use $U^*$ as a surrogate for $U$.
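For Steps 4 and 5, the largest convex function lying below a selected set of data points is the lower convex hull of these points, extended by $+\infty$ outside their range. The sketch below, assuming NumPy, computes it with Andrew's monotone-chain scan, used here as a swap for the pool-adjacent-violators variant mentioned above (both are linear-time once the points are sorted); the smallest concave majorant needed for $H_\ell$ follows by flipping signs. The function names are ours:

```python
import numpy as np

def convex_minorant(x, y, mask, x_eval):
    """Largest convex G with G(x_i) <= y_i for every i with mask[i] True.

    G is the lower convex hull of the selected points, linear between hull
    vertices and +inf outside their x-range. At least one point must be selected.
    """
    xs = np.asarray(x, dtype=float)[mask]
    ys = np.asarray(y, dtype=float)[mask]
    order = np.lexsort((ys, xs))                 # sort by x, ties by y
    xs, ys = xs[order], ys[order]
    keep = np.r_[True, np.diff(xs) > 0]          # for tied x keep the smallest y
    xs, ys = xs[keep], ys[keep]
    hull = []                                    # lower hull, Andrew's monotone chain
    for px, py in zip(xs, ys):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            # drop the last vertex if it lies on or above the segment to (px, py)
            if (ax - ox) * (py - oy) - (ay - oy) * (px - ox) <= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    hx = np.array([p[0] for p in hull])
    hy = np.array([p[1] for p in hull])
    x_eval = np.asarray(x_eval, dtype=float)
    out = np.full(x_eval.shape, np.inf)
    inside = (x_eval >= hx[0]) & (x_eval <= hx[-1])
    out[inside] = np.interp(x_eval[inside], hx, hy)
    return out

def concave_majorant(x, y, mask, x_eval):
    """Smallest concave H with H(x_i) >= y_i for masked i (-inf outside their range)."""
    return -convex_minorant(x, -np.asarray(y, dtype=float), mask, x_eval)
```

For a line $G_\ell$ given as an array `G` of its values at the design points (a hypothetical variable), `convex_minorant(x, Y, G <= Y, x)` returns $\tilde G_\ell(x_i)$, and `concave_majorant(x, Y, np.maximum(G_prev, G) >= Y, x)` returns $H_\ell(x_i)$.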

Approximation of L. Recall that the exact computation amounts to fixing any function $h_j^l$ and finding the smallest function $h_k^r$ such that $T_o(\mathrm{sign}(\mathbf{Y} - h_{j,k}(\mathbf{x}))) \le \kappa$. Now approximations may be obtained by picking only a subset of the potential indices $j$. In addition, one may fix some functions $h_k^r$ and look for the smallest $h_j^l$ satisfying the constraint $T_o(\mathrm{sign}(\mathbf{Y} - h_{j,k}(\mathbf{x}))) \le \kappa$. Again this leads to approximations $L_*$ and $L^*$ for $L$ such that $L_* \le L \le L^*$.



5. Asymptotic properties 

In this section we consider a triangular array of observations $x_i = x_{n,i}$ and $Y_i = Y_{n,i}$. Our confidence band $(L,U)$ will be shown to have certain consistency properties, provided that $f$ satisfies some smoothness condition, and that the following two requirements are met for some constants $-\infty < a < b < \infty$:

(A1) Let $M_n$ denote the empirical distribution of the design points $x_{n,i}$. That means, $M_n(B) := n^{-1}\#\{i : x_{n,i} \in B\}$ for $B \subset \mathbb{R}$. There is a constant $c > 0$ such that

$$\liminf_{n \to \infty} \frac{M_n[a_n, b_n]}{b_n - a_n} \ge c$$

whenever $a \le a_n < b_n \le b$ and $\liminf_{n\to\infty} \log(b_n - a_n)/\log n > -1$.

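(A2) All variables $Y_i = Y_{n,i}$ with $x_{n,i} \in [a,b]$ satisfy the following inequalities:

$$P\bigl(Y_i \ge f(x_i) + r\bigr),\ P\bigl(Y_i \le f(x_i) - r\bigr) \le 1/2 - H(r) \quad \text{for any } r > 0,$$

where $H$ is some fixed function on $[0,\infty]$ such that

$$\liminf_{r \downarrow 0} \frac{H(r)}{r} > 0.$$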

These conditions (A1) and (A2) are satisfied in various standard models, as pointed out by Dümbgen and Johns [2].

Theorem 1. Suppose that assumptions (A1) and (A2) hold.
(a) Let $f$ be linear on $[a,b]$. Then for arbitrary $a < a' < b' < b$,

$$\sup_{x \in [a',b']} \bigl(f(x) - L(x)\bigr) + \sup_{x \in [a',b']} \bigl(U(x) - f(x)\bigr) = O_p(n^{-1/2}).$$

(b) Let $f$ be Hölder continuous on $[a,b]$ with exponent $\beta \in (1,2]$. That means, $f$ is differentiable on $[a,b]$ such that for some constant $L > 0$ and arbitrary $x, y \in [a,b]$,

$$|f'(x) - f'(y)| \le L|x - y|^{\beta - 1}.$$

Then for $\rho_n := \log(n+1)/n$ and $\delta_n := \rho_n^{1/(2\beta+1)}$,

$$\sup_{x \in [a,b]} \bigl(f(x) - L(x)\bigr) + \sup_{x \in [a+\delta_n,\, b-\delta_n]} \bigl(U(x) - f(x)\bigr)^+ = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr).$$



Part (a) of this theorem explains the empirical findings in Section 6 that the band $(L,U)$ performs particularly well in regions where the regression function $f$ is linear.

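Proof of Theorem 1, step I. At first we prove the assertions about $U$. Note that for arbitrary $t, z \in \mathbb{R}$ with $z < U(t)$ there exist parameters $\mu, \nu \in \mathbb{R}$ such that $z = \mu + \nu t$ and

$$S_{d,j}(\mu,\nu) := \sum_{i=1}^n \psi\Bigl(\frac{i-j}{d}\Bigr) \mathrm{sign}(\mu + \nu x_i - Y_i) \le \beta_d^{-1}\Bigl(\Gamma\Bigl(\frac{2d-1}{n}\Bigr) + \kappa\Bigr) \quad \text{for any } (d,j) \in \mathcal{T}_n; \tag{7}$$

here $\mathcal{T}_n$ denotes the set of all pairs $(d,j)$ of integers $d > 0$, $j \in [d, n+1-d]$. Therefore it is crucial to have good simultaneous upper bounds for $|S_{d,j}(\mu,\nu) - \Sigma_{d,j}(\mu,\nu)|$, where

$$\Sigma_{d,j}(\mu,\nu) := \mathbb{E}\, S_{d,j}(\mu,\nu) = \sum_{i=1}^n \psi\Bigl(\frac{i-j}{d}\Bigr)\bigl(2P(Y_i < \mu + \nu x_i) - 1\bigr).$$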

One may write $S_{d,j}(\mu,\nu) = \int g_{d,j,\mu,\nu}\, d\Psi_n$ with the random measure

$$\Psi_n := \sum_{i=1}^n \delta_{(i, x_i, Y_i)}$$

and the function

$$(i, x, y) \mapsto g_{d,j,\mu,\nu}(i,x,y) := \psi\Bigl(\frac{i-j}{d}\Bigr)\mathrm{sign}(\mu + \nu x - y) \in [-1,1]$$

on $\mathbb{R}^3$. The family of all these functions $g_{d,j,\mu,\nu}$ is easily shown to be a Vapnik-Červonenkis subgraph class in the sense of van der Vaart and Wellner [7]. Moreover, $\Psi_n$ is a sum of $n$ stochastically independent random probability measures. Thus well-known results from empirical process theory (cf. Pollard [5]) imply that for arbitrary $\eta > 0$,

$$P\Bigl(\sup_{(d,j) \in \mathcal{T}_n,\ (\mu,\nu) \in \mathbb{R}^2} \bigl|S_{d,j}(\mu,\nu) - \Sigma_{d,j}(\mu,\nu)\bigr| > n^{1/2}\eta\Bigr) \le C\exp(-\eta^2/C), \tag{8}$$

$$P\Bigl(\sup_{(\mu,\nu) \in \mathbb{R}^2} \bigl|S_{d,j}(\mu,\nu) - \Sigma_{d,j}(\mu,\nu)\bigr| > d^{1/2}\eta \text{ for some } (d,j) \in \mathcal{T}_n\Bigr) \le C\exp(2\log n - \eta^2/C), \tag{9}$$

where $C \ge 1$ is a universal constant. Consequently, for any fixed $\alpha' > 0$ there is a constant $C > 0$ such that the following inequalities are satisfied simultaneously for arbitrary $(d,j) \in \mathcal{T}_n$ and $(\mu,\nu) \in \mathbb{R}^2$ with probability at least $1 - \alpha'$:

$$\bigl|S_{d,j}(\mu,\nu) - \Sigma_{d,j}(\mu,\nu)\bigr| \le \begin{cases} C n^{1/2},\\ C d^{1/2}\log(n+1)^{1/2}. \end{cases} \tag{10}$$

In what follows we assume (10) for some fixed $C$.

Proof of part (a) for $U$. Suppose that $f$ is linear on $[a,b]$, and let $[a',b'] \subset (a,b)$. By convexity of $U$, the maximum of $U - f$ over $[a',b']$ is attained at $a'$ or $b'$. We consider the first case and assume that $U(a') > f(a') + \epsilon_n$ for some $\epsilon_n > 0$. Then there exist $\mu, \nu \in \mathbb{R}$ satisfying (7) such that $\mu + \nu a' = f(a') + \epsilon_n$ and $\nu \le f'(a')$. In particular, $\mu + \nu x - f(x) \ge \epsilon_n$ for all $x \in [a, a']$. Now we pick a pair $(d_n, j_n) \in \mathcal{T}_n$ with $d_n$ as large as possible such that

$$[x_{j_n - d_n + 1},\, x_{j_n + d_n - 1}] \subset [a, a'].$$

Assumption (A1) implies that $d_n \ge (c(a'-a)/2 + o(1))n$. Now

$$\Sigma_{d_n,j_n}(\mu,\nu) = \sum_{i=j_n-d_n+1}^{j_n+d_n-1} \psi\Bigl(\frac{i-j_n}{d_n}\Bigr)\bigl(2P(Y_i < \mu + \nu x_i) - 1\bigr) \ge d_n H(\epsilon_n)$$

by assumption (A2). Combining this inequality with (7) and (10) yields

$$\beta_{d_n}^{-1}\Bigl(\Gamma\Bigl(\frac{2d_n-1}{n}\Bigr) + \kappa\Bigr) \ge d_n H(\epsilon_n) - C n^{1/2}. \tag{11}$$

But $\beta_d^{-1} = 3^{-1/2}(2d-1)^{1/2} + O(d^{-1/2})$, and $x \mapsto x^{1/2}\Gamma(x)$ is non-decreasing on $(0,1]$. Hence (11) implies that

$$H(\epsilon_n) \le d_n^{-1}\Bigl((3^{-1/2} + o(1))(2d_n - 1)^{1/2}\Bigl(\Gamma\Bigl(\frac{2d_n-1}{n}\Bigr) + \kappa\Bigr) + C n^{1/2}\Bigr) \le d_n^{-1} n^{1/2}\bigl((3^{-1/2} + o(1))(\Gamma(1) + \kappa) + C\bigr) = O(n^{-1/2}).$$

Consequently, $\epsilon_n = O(n^{-1/2})$.

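Proof of part (b) for $U$. Now suppose that $f'$ is Hölder-continuous on $[a,b]$ with exponent $\beta - 1 \in (0,1]$ and constant $L > 0$. Let $U(x) > f(x) + \epsilon_n$ for some $x \in [a + \delta_n, b - \delta_n]$ and $\epsilon_n > 0$. Then there are numbers $\mu, \nu \in \mathbb{R}$ satisfying (7) such that $\mu + \nu x = f(x) + \epsilon_n$. Let $(d_n, j_n) \in \mathcal{T}_n$ with $d_n$ as large as possible such that either

$$f'(x) \le \nu \quad \text{and} \quad [x_{j_n-d_n+1},\, x_{j_n+d_n-1}] \subset [x, x + \delta_n],$$

or

$$f'(x) \ge \nu \quad \text{and} \quad [x_{j_n-d_n+1},\, x_{j_n+d_n-1}] \subset [x - \delta_n, x].$$

Assumption (A1) implies that

$$d_n \ge (c/2 + o(1))\delta_n n.$$

Moreover, for any $i \in \{j_n - d_n + 1, \ldots, j_n + d_n - 1\}$,

$$\mu + \nu x_i - f(x_i) = \epsilon_n + \int_x^{x_i} \bigl(\nu - f'(t)\bigr)\,dt \ge \epsilon_n + \int_x^{x_i} \bigl(f'(x) - f'(t)\bigr)\,dt \ge \epsilon_n - L\int_0^{\delta_n} s^{\beta-1}\,ds = \epsilon_n - O(\delta_n^\beta),$$

so that

$$\Sigma_{d_n,j_n}(\mu,\nu) \ge d_n H\bigl((\epsilon_n - O(\delta_n^\beta))^+\bigr).$$

Combining this inequality with (7) and (10) yields

$$H\bigl((\epsilon_n - O(\delta_n^\beta))^+\bigr) \le d_n^{-1}\beta_{d_n}^{-1}\Bigl(\Gamma\Bigl(\frac{2d_n-1}{n}\Bigr) + \kappa\Bigr) + C d_n^{-1/2}\log(n+1)^{1/2} \le d_n^{-1}\beta_{d_n}^{-1}\bigl(2^{1/2}\log(n)^{1/2} + \kappa\bigr) + C d_n^{-1/2}\log(n+1)^{1/2} = O(\delta_n^\beta). \tag{12}$$

This entails that $\epsilon_n$ has to be of order $O(\delta_n^\beta) = O(\rho_n^{\beta/(2\beta+1)})$. □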
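The choice $\delta_n = \rho_n^{1/(2\beta+1)}$ is what makes the stochastic term in (12) and the approximation error $O(\delta_n^\beta)$ from the Hölder condition of the same order. The following side calculation (ours, not part of the original proof) uses only $d_n \ge (c/2 + o(1))\delta_n n$ and $\rho_n = \log(n+1)/n$:

```latex
\begin{align*}
\delta_n^{2\beta+1} &= \rho_n = \frac{\log(n+1)}{n},\\
d_n^{-1/2}\log(n+1)^{1/2}
  &\le \bigl((c/2 + o(1))\,\delta_n n\bigr)^{-1/2}\log(n+1)^{1/2}
   = O\bigl((\rho_n/\delta_n)^{1/2}\bigr)
   = O\bigl((\delta_n^{2\beta})^{1/2}\bigr)
   = O(\delta_n^{\beta}),\\
\delta_n^{\beta} &= \rho_n^{\beta/(2\beta+1)}.
\end{align*}
```

A larger $\delta_n$ would let the approximation error dominate, while a smaller one would inflate the stochastic term $d_n^{-1/2}\log(n+1)^{1/2}$.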

Proof of Theorem 1, step II. Now we turn our attention to $L$. For that purpose we change the definition of $S_{d,j}(\cdot,\cdot)$ and $\Sigma_{d,j}(\cdot,\cdot)$ as follows: Let $U_n$ be a fixed convex function to be specified later. Then for $(t,z) \in \mathbb{R}^2$ we define $h_{n,t,z}$ to be the largest convex function $h$ such that $h \le U_n$ and $h(t) \le z$. This definition is similar to the definition of $h_{t,z}$ in Section 3.2. Indeed, if $U \le U_n$ and $L(t) \le z$, then

$$S_{d,j}(t,z) := \sum_{i=1}^n \psi\Bigl(\frac{i-j}{d}\Bigr)\mathrm{sign}\bigl(Y_i - h_{n,t,z}(x_i)\bigr) \le \beta_d^{-1}\Bigl(\Gamma\Bigl(\frac{2d-1}{n}\Bigr) + \kappa\Bigr) \quad \text{for any } (d,j) \in \mathcal{T}_n. \tag{13}$$

Here we set

$$\Sigma_{d,j}(t,z) := \mathbb{E}\, S_{d,j}(t,z) = \sum_{i=1}^n \psi\Bigl(\frac{i-j}{d}\Bigr)\bigl(2P(Y_i > h_{n,t,z}(x_i)) - 1\bigr).$$

Again we may and do assume that (10) is true for some constant $C$.

Proof of part (a) for $L$. Suppose that $f$ is linear on $[a,b]$. We define $U_n(x) := f(x) + \gamma n^{-1/2} + 1\{x \notin [a',b']\}\infty$ with constants $\gamma > 0$ and $a < a' < b' < b$. Since $\liminf_{n\to\infty} P(U \le U_n)$ tends to one as $\gamma \to \infty$, we may assume that $U \le U_n$. Suppose that $L(t) \le z := f(t) - 2\epsilon_n$ for some $t \in [a,b]$ and $\epsilon_n \ge \gamma n^{-1/2}$. A simple geometrical consideration shows that $h_{n,t,z} \le f - \epsilon_n$ on an interval $[a'', b''] \subset [a,b]$ of length $b'' - a'' \ge (b' - a')/3$. If we pick $(d_n, j_n) \in \mathcal{T}_n$ with $d_n$ as large as possible such that $[x_{j_n-d_n+1}, x_{j_n+d_n-1}] \subset [a'', b'']$, then $d_n \ge (c(b'-a')/6 + o(1))n$. Moreover, (13) and (10) entail (11), whence $\epsilon_n = O(n^{-1/2})$.

Proof of part (b) for $L$. Now suppose that $f'$ is Hölder-continuous on $[a,b]$ with exponent $\beta - 1 \in (0,1]$ and constant $L > 0$. Here we define $U_n(x) := f(x) + \gamma\delta_n^\beta + 1\{x \notin [a + \delta_n, b - \delta_n]\}\infty$ with a constant $\gamma > 0$, and we assume that $U \le U_n$. Suppose that $L(t) \le z := f(t) - 2\epsilon_n$ for some $t \in [a,b]$ and $\epsilon_n > 0$. If $t \le b - 2\delta_n$, then

$$h_{n,t,z}(t + \lambda\delta_n) \le z + \lambda\bigl(U_n(t + \delta_n) - z\bigr) = f(t) - 2\epsilon_n + \lambda\bigl(f(t + \delta_n) - f(t) + 2\epsilon_n + \gamma\delta_n^\beta\bigr) = f(t) - 2(1-\lambda)\epsilon_n + \lambda\int_0^{\delta_n} f'(t + s)\,ds + \lambda\gamma\delta_n^\beta$$

for $0 \le \lambda \le 1$. Thus

$$f(t + \lambda\delta_n) - h_{n,t,z}(t + \lambda\delta_n) \ge 2(1-\lambda)\epsilon_n + \lambda\int_0^{\delta_n}\bigl(f'(t + \lambda s) - f'(t + s)\bigr)\,ds - \lambda\gamma\delta_n^\beta \ge 2(1-\lambda)\epsilon_n - \lambda\int_0^{\delta_n} L(1-\lambda)^{\beta-1}s^{\beta-1}\,ds - \lambda\gamma\delta_n^\beta \ge \epsilon_n - O(\delta_n^\beta)$$

uniformly for $0 \le \lambda \le 1/2$. Analogous arguments apply in the case $t \ge a + 2\delta_n$. Consequently there is an interval $[a_n, b_n] \subset [a,b]$ of length $\delta_n/2$ such that $f - h_{n,t,z} \ge \epsilon_n - O(\delta_n^\beta)$ on $[a_n, b_n]$, provided that $a + 2\delta_n \le b - 2\delta_n$. Again we choose $(d_n, j_n) \in \mathcal{T}_n$ with $d_n$ as large as possible such that $[x_{j_n-d_n+1}, x_{j_n+d_n-1}] \subset [a_n, b_n]$. Then $d_n \ge (c/4 + o(1))\delta_n n$, and (13) and (10) lead to (12). Thus $\epsilon_n = O(\delta_n^\beta) = O(\rho_n^{\beta/(2\beta+1)})$. □



6. Numerical examples 

At first we illustrate the confidence band $(L,U)$ defined in Section 3 with some simulated data. Precisely, we generated

$$Y_i = f(x_i) + \sigma\epsilon_i$$

with $x_i := (i - 1/2)/n$, $n = 500$ and

$$f(x) := \begin{cases} -12(x - 1/3) & \text{if } x \le 1/3,\\ (27/2)(x - 1/3)^2 & \text{if } x \ge 1/3. \end{cases}$$

Moreover, $\sigma = 1/2$, and the random errors $\epsilon_1, \ldots, \epsilon_n$ have been simulated from a Student distribution with five degrees of freedom. Figure 4 depicts these data together with the corresponding 95%-confidence band $(L,U)$ and $f$ itself. Note that the width of the band is smallest near the center of the interval $(0, 1/3)$ on which $f$ is linear. This is in accordance with part (a) of Theorem 1.
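The simulation design can be reproduced along the following lines, assuming NumPy. The seed is arbitrary, so the draws will not coincide with the data shown in Figure 4, and the function names `f_true` and `simulate` are ours:

```python
import numpy as np

def f_true(x):
    """Convex test function of Section 6: linear up to 1/3, quadratic afterwards."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 1/3, -12.0 * (x - 1/3), 13.5 * (x - 1/3) ** 2)

def simulate(n=500, sigma=0.5, df=5, seed=None):
    """Y_i = f(x_i) + sigma * eps_i with x_i = (i - 1/2)/n and Student-t(df) errors."""
    rng = np.random.default_rng(seed)
    x = (np.arange(1, n + 1) - 0.5) / n
    Y = f_true(x) + sigma * rng.standard_t(df, size=n)
    return x, Y

x, Y = simulate(seed=2024)
```

The slope of $f$ jumps from $-12$ to $0$ at $x = 1/3$ and increases afterwards, so $f$ is convex, with $f(0) = 4$, $f(1/3) = 0$ and $f(1) = 6$.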

Secondly we applied our procedure to a dataset containing the income $x_i$ and the expenditure $Y_i$ for food in the year 1973 for $n = 7125$ households in Great Britain (Family Expenditure Survey 1968-1983). This dataset has also been analyzed by Härdle and Marron [4]. They computed simultaneous confidence intervals for $E(Y_i) = f(x_i)$ by means of kernel estimators and bootstrap methods. Figure 5 depicts the data. In order to enhance the main portion, the axes have been chosen such that 72 outlying observations are excluded from the display. Figure 6 shows a 95%-confidence band for the isotonic median function $f$, as described by Dümbgen and Johns [2]. Figure 7 shows a 95%-confidence band for the concave median function $f$, as described in the present paper. Note that the latter band has substantially smaller width than the former one. This is in accordance with our theoretical results about rates of convergence.
Fig 4. Simulated data and 95%-confidence band $(L,U)$, where $n = 500$.




Fig 5. Income-expenditure data.
Fig 7. 95%-confidence band for concave median function.